Job Talk

https://lilykoff.github.io/job_talk2025

Lily Koffman

Outline

Digital fingerprinting with accelerometry data

Big picture method: time series to scalar predictors

Each row in X is a second of data

Fit n regression models (one vs. rest)

Details of the method

For each second and each person:

  • Obtain the joint distribution of acceleration and lag acceleration for a series of lags

  • Either:

    • Obtain summaries of the joint distribution
    • Use the full joint distribution directly in functional regression
  • We walk through one second, one person, and one lag as an illustration
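
The step for one second, one person, and one lag can be sketched as follows. This is a minimal illustration, not the talk's implementation: the helper names `lagged_pairs` and `grid_counts` are hypothetical, and summarizing the joint distribution as counts in grid cells defined by cut points is an assumption (the slides only say "summaries").

```python
from bisect import bisect_right


def lagged_pairs(accel, lag):
    """Pair each sample with the sample `lag` steps earlier: (v(s - lag), v(s))."""
    return [(accel[s - lag], accel[s]) for s in range(lag, len(accel))]


def grid_counts(pairs, cuts):
    """Count pairs falling in each cell of a square grid defined by `cuts`.

    `cuts` are interior cut points, so there are len(cuts) + 1 bins per axis.
    Rows index the lagged value, columns index the current value.
    """
    n_bins = len(cuts) + 1
    counts = [[0] * n_bins for _ in range(n_bins)]
    for lagged, current in pairs:
        counts[bisect_right(cuts, lagged)][bisect_right(cuts, current)] += 1
    return counts


# One second of toy data (8 samples instead of 100, for readability).
second = [0.9, 1.1, 1.4, 1.0, 0.7, 0.9, 1.3, 1.1]
pairs = lagged_pairs(second, lag=2)
# A single assumed cut point at 1.0 gives two bins per axis: < 1.0 and >= 1.0.
predictors = grid_counts(pairs, cuts=[1.0])
print(predictors)
```

Each cell count would become one scalar predictor for that second; repeating over lags, seconds, and people would fill the rows and columns of X.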

Obtain joint distribution of acceleration and lag acceleration

Derive predictors from joint distribution

Repeat for multiple lags

Repeat for all seconds

Repeat for all people

Fit models

  • \(n\) models, one for each person
  • Model \(j\) predicts probability that second \(i\) is from person \(j\)
  • Max prediction across all models is the predicted person for that second
  • Models include: logistic regression w/ variable selection, lasso, random forest, XGBoost, etc.
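
The one-vs-rest scheme above can be sketched in a few lines. This is an illustration under stated assumptions: plain gradient-descent logistic regression stands in for the models listed (no variable selection or penalty), the function names are hypothetical, and the toy data has two predictors and three people.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(beta, row):
    """Predicted probability from coefficients [intercept, *slopes]."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Unpenalized logistic regression fit by batch gradient descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = predict_proba(beta, row) - target
            grad[0] += err
            for k, x in enumerate(row):
                grad[k + 1] += err * x
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def fit_one_vs_rest(X, labels):
    """Fit one binary model per person: that person vs. everyone else."""
    return {
        person: fit_logistic(X, [1.0 if lab == person else 0.0 for lab in labels])
        for person in sorted(set(labels))
    }


def predict_person(models, row):
    """Predicted person = the model with the largest predicted probability."""
    return max(models, key=lambda person: predict_proba(models[person], row))


# Toy data: each row is one second, described by two scalar predictors.
X = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.4),
     (4.0, 0.1), (3.8, 0.3), (4.2, 0.0),
     (0.2, 4.0), (0.0, 3.7), (0.3, 4.1)]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = fit_one_vs_rest(X, labels)
print(predict_person(models, (4.0, 0.2)))
```

Taking the maximum across models means the \(n\) predicted probabilities for a second need not sum to one; only their ranking matters for the predicted identity.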

Aside: functional regression approach

\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]

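\(p_{ij}^{i_0}\) = probability that second \(j\) of subject \(i\) comes from person \(i_0\)

\(S = 100\) (number of observations per second); \(u = 1, \dots, S\) is the lag in centiseconds

\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)

\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function for person \(i_0\)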
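
To make the double integral concrete, here is a sketch of its Riemann-sum approximation for one second of data. The smooth function is unknown and estimated in practice; the `toy_F` below is a made-up placeholder, the names `double_sum` and `predicted_probability` are hypothetical, and the 0-based indexing (so that `v[s - u]` is always defined) is an assumption about how the integration limits are discretized.

```python
import math


def double_sum(v, F):
    """Approximate the double integral over lag u and time s by a sum.

    v: list of S acceleration values for one second (0-indexed here).
    F: callable F(current, lagged, lag) standing in for the smooth function.
    """
    S = len(v)
    total = 0.0
    for u in range(1, S):        # lags
        for s in range(u, S):    # time points that have a value u steps earlier
            total += F(v[s], v[s - u], u)
    return total


def predicted_probability(beta0, v, F):
    """Invert the logit link: p = 1 / (1 + exp(-(beta0 + double sum)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + double_sum(v, F))))


def toy_F(current, lagged, lag):
    # Placeholder surface, chosen only so the example runs.
    return 0.1 * current * lagged / lag


v = [1.0, 2.0, 3.0]
print(double_sum(v, toy_F))
print(predicted_probability(-1.0, v, toy_F))
```

With \(S = 100\) samples per second the sum has \(S(S-1)/2 = 4950\) terms, one for each pair of acceleration and lag acceleration.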

“Fingerprints” summarize predictors for a given lag and are different across individuals

The method works!

  • Applied to three datasets
    • \(30\) people, \(6\) min of walking each, outdoors
    • \(153\) people, \(2\) min of walking each, indoors
      • Repeated sessions \(1\) week to \(6\) months apart
    • \(14{,}000\) people who wore accelerometer for \(7\) days
      • Used segmentation algorithm to ID walking
      • Then used \(3\) min of data from each person
      • Oversampling + weighting w/ logistic regression to overcome class imbalance
  • Two train/test paradigms
    • Random: seconds from all people randomly assigned to train/test
    • Temporal: some days/sessions assigned to train, other days to test
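
The difference between the two paradigms can be sketched as follows. The row layout (dictionaries with `person`, `session`, and `second` keys), the function names, and the 25% test fraction are all assumptions made for illustration.

```python
import random


def random_split(rows, test_frac=0.25, seed=0):
    """Random paradigm: every second is assigned to train or test at random."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(rows, train_sessions):
    """Temporal paradigm: whole sessions (or days) go to train, the rest to test."""
    train = [r for r in rows if r["session"] in train_sessions]
    test = [r for r in rows if r["session"] not in train_sessions]
    return train, test


# Toy data: 2 people x 2 sessions x 4 seconds.
rows = [
    {"person": p, "session": sess, "second": sec}
    for p in ("A", "B")
    for sess in (1, 2)
    for sec in range(4)
]

rand_train, rand_test = random_split(rows)
temp_train, temp_test = temporal_split(rows, train_sessions={1})
print(len(rand_train), len(rand_test), len(temp_train), len(temp_test))
```

In the random split, train and test seconds can come from the same session, whereas the temporal split forces the model to recognize a person from data collected at a different time.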

Train/test paradigms, visualized

So what?

Detour through NHANES: open source step counting

  • National Health and Nutrition Examination Survey (NHANES)
  • Nationally representative survey of the US population
  • Free-living accelerometry
  • Physical activity summaries: not interpretable or translatable
  • Steps: easy to understand measure of physical activity
  • Can we accurately count steps from free-living accelerometry?

Open source step counting

Step estimates vary greatly between algorithms

But all algorithms estimate a decline in steps with age

Being in higher step quartile associated with lower adjusted mortality risk

Detour through NHANES: survey-weighted functional regression

Question motivated by NHANES: how are physical activity patterns associated with covariates like age, sex?

Survey-weighted functional regression

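We can answer this question with function-on-scalar regression (FoSR):

\[\mathbb{E}[\mathrm{MIMS}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{gender}_i + \beta_2(s)\mathrm{age}_i \]

\(\mathrm{MIMS}_i(s)\) = physical activity of participant \(i\) at time \(s\), in Monitor-Independent Movement Summary (MIMS) units

Implementation: fast univariate inference (FUI)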

But: NHANES is not a simple random sample

Are our estimates valid for population-level inference?

\(\texttt{svyfosr}\): first survey-weighted functional regression implementation in R
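
As a rough illustration of how survey weights can enter a function-on-scalar regression, the sketch below fits a weighted least squares line separately at each point \(s\) of the functional domain. It is not the `svyfosr` API and does not reproduce its estimator: the function name `weighted_pointwise_fit` is hypothetical, only one covariate is used, the weights are treated as simple observation weights, and smoothing and inference are omitted.

```python
def weighted_pointwise_fit(Y, x, w):
    """Weighted least squares of Y[i][s] on x[i], fit separately at each s.

    Y: one row per participant, one column per point s of the domain.
    x: scalar covariate per participant (for example, age).
    w: survey weight per participant.
    Returns (beta0, beta1), each a list indexed by s.
    """
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    beta0, beta1 = [], []
    for s in range(len(Y[0])):
        y_s = [row[s] for row in Y]
        y_bar = sum(wi * yi for wi, yi in zip(w, y_s)) / total_w
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y_s))
        slope = sxy / sxx
        beta1.append(slope)
        beta0.append(y_bar - slope * x_bar)
    return beta0, beta1


# Toy data: curves that are exactly linear in age at every s.
ages = [20.0, 40.0, 60.0, 80.0]
weights = [1.0, 3.0, 2.0, 0.5]
Y = [[s + 0.1 * (s + 1) * age for s in range(3)] for age in ages]

beta0, beta1 = weighted_pointwise_fit(Y, ages, weights)
print(beta0, beta1)
```

Setting all weights to 1 gives the unweighted pointwise fit, so the weights are the only place the sampling design enters this sketch.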

Digital fingerprinting with hemodynamics data

Preliminary hemodynamics work

Coarse literature targets for mean arterial pressure (MAP) and central venous pressure (CVP)

MAP and CVP are not independent; they occur simultaneously

Fingerprinting with arterial waveform

  • Fit XGBoost model on 727 patients
  • Mean (SD) \(7 (1.8)\) minutes per patient, range \(3\) to \(16\) minutes
  • Obtain predictors for many different lags and cut points
  • Use predictors that are among the top 10 contributors to the first 30 principal components (PCs) (\(\approx 100\) predictors)
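
The predictor-selection step can be sketched as below, given a matrix of PC loadings. This assumes that "contributor" means largest absolute loading and that the final set is the union across PCs; the function name `top_contributors` is hypothetical, and the PCA itself is not shown.

```python
def top_contributors(loadings, n_pcs=30, top_k=10):
    """Union of the top_k predictors (by absolute loading) across the first n_pcs PCs.

    loadings[k][j] is the loading of predictor j on principal component k.
    Returns the sorted indices of the selected predictors.
    """
    selected = set()
    for pc in loadings[:n_pcs]:
        ranked = sorted(range(len(pc)), key=lambda j: abs(pc[j]), reverse=True)
        selected.update(ranked[:top_k])
    return sorted(selected)


# Toy loadings: 2 PCs over 4 predictors.
loadings = [
    [0.9, -0.1, 0.3, 0.0],
    [0.7, -0.8, 0.2, 0.1],
]
selected = top_contributors(loadings, n_pcs=2, top_k=2)
print(selected)
```

Because the same predictor can rank highly on several PCs, the union is smaller than `n_pcs * top_k`, which is consistent with roughly 100 predictors surviving out of a possible 300.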

Future Directions

  • Using changes in fingerprint (both walking and waveform) to predict changes in function
  • Designing real-time interventions based on hemodynamics patterns
  • Extending survey FoSR to longitudinal outcomes
  • Standardizing processing and analysis pipelines for wearable accelerometry

Thank you!